D.18 Scout 2008 v1.00.01
Approximate Cost: Free
Source: USEPA
Current Version: 2008 v 1.00.01 (2009)
Operating System Needs: Windows 98 or newer
Software Needs: Microsoft .NET version 1.1 Framework
Input Structure: Microsoft Excel spreadsheet (XLS) or comma-separated values (CSV) file
Overview
Scout was developed by Lockheed-Martin under contract with the USEPA. It is a comprehensive, public domain data analysis software package that implements statistical methods for evaluating data sets for groundwater monitoring optimization, background contaminant evaluations, and risk analysis for quantifying cleanup criteria.
Scout was designed to allow practitioners and decision makers to learn and apply accepted statistical techniques. The tool lets you incorporate qualitative considerations into the analysis, and output may be adjusted based on user-specified considerations and rationales.
In addition, two stand-alone software packages, ProUCL 4.0 and Parallax, are incorporated into Scout. ProUCL 4.0 (2009) has since been updated to ProUCL 5.0 (2013), which is not part of Scout 2008 v 1.00.01. ProUCL 5.0 is a statistical software package for performing statistical evaluations, including hypothesis testing and calculating upper limits of data sets with and without nondetect observations with multiple reporting limits. Parallax is a software package that offers graphical and classification tools for analyzing multivariate data using parallel coordinates.
Four statistical modules are used with Scout, as described below.
The data module generates univariate data sets from normal, lognormal, gamma, and uniform distributions, and multivariate data sets from normal distributions. The software can also perform transformation operations on univariate and multivariate data for data sets with and without nondetects. The software handles nondetects through substitution and regression on order statistics (ROS) methods, and it also supports estimation of missing observations.
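The substitution approach mentioned above can be illustrated with a minimal Python sketch. This is not Scout's implementation; the function name, the example values, and the choice of one-half the reporting limit as the substitute are all assumptions made for illustration.

```python
from statistics import mean

def substitute_nondetects(values, detected, fraction=0.5):
    """Replace each nondetect with `fraction` times its reporting limit.

    values   -- measured concentration, or the reporting limit for a nondetect
    detected -- True where the result was detected, False for a nondetect
    fraction -- assumed substitution factor (one-half is a common convention)
    """
    return [v if d else fraction * v for v, d in zip(values, detected)]

# Hypothetical results (mg/L); the two nondetects carry limits of 1.0 and 2.0.
concentrations = [3.2, 1.0, 4.5, 2.0, 5.1]
detected_flags = [True, False, True, False, True]

adjusted = substitute_nondetects(concentrations, detected_flags)
print(adjusted)
print(mean(adjusted))
```

Because each nondetect keeps its own reporting limit, the same function works for data sets with multiple reporting limits.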
The graphs module generates plots for single or grouped data sets. Graphical capabilities are described further in the Visualization section.
For univariate data sets, Scout can perform a variety of descriptive statistics, classical interval estimates, and goodness-of-fit tests. The software is capable of parametric and nonparametric methods including Kaplan-Meier, ROS, and bootstrap methods on left-censored data sets. The software also performs univariate and multivariate outlier evaluations, robust estimation, single-sample and two-sample hypothesis tests (including quantile tests), and the Wilcoxon-Mann-Whitney test.
The goodness-of-fit tests may be used to theoretically test and verify the normality of a data set and also to identify outliers. The goodness-of-fit tests may also be incorporated into the quantile-quantile plots along with correlation coefficients and critical values for a specified significance level, α.
This module performs outlier identification and estimation for both univariate and multivariate data sets using both classical and robust methods. The univariate methods for uncensored and left-censored data sets include Dixon's test, Rosner's test, and Grubbs' test, as well as Tukey’s robust biweight method. Multivariate outlier identification and estimation methods include but are not limited to: Max MD and multivariate kurtosis sequential classical methods, iterative robust and resistant M-estimation methods based on Huber and PROP influence functions, and the MCD and MVT methods.
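As an illustration of the correlation coefficient that can accompany a quantile-quantile plot, the following Python sketch correlates the ordered data with theoretical normal quantiles. It is a hedged example rather than Scout's method: the function name and the Blom plotting positions are assumptions, and the critical values used to judge the coefficient at a given α are not computed here.

```python
from math import sqrt
from statistics import NormalDist, mean

def normal_qq_correlation(data):
    """Pearson correlation between ordered data and normal quantiles."""
    n = len(data)
    ordered = sorted(data)
    # Blom plotting positions (an assumed choice; other conventions exist).
    probs = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    quantiles = [NormalDist().inv_cdf(p) for p in probs]
    mq, md = mean(quantiles), mean(ordered)
    cov = sum((q - mq) * (d - md) for q, d in zip(quantiles, ordered))
    var_q = sum((q - mq) ** 2 for q in quantiles)
    var_d = sum((d - md) ** 2 for d in ordered)
    return cov / sqrt(var_q * var_d)

# Hypothetical data: one symmetric sample and one with an extreme high value.
print(normal_qq_correlation([1, 2, 3, 4, 5]))
print(normal_qq_correlation([1, 2, 3, 4, 100]))
```

A coefficient close to 1 indicates that the points on the quantile-quantile plot fall near a straight line; the second sample gives a noticeably lower value because of the extreme observation.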
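For one of the classical univariate methods named above, the Grubbs test statistic can be sketched in a few lines of Python. This is only the statistic, under the assumption of a single suspected outlier in an otherwise normal sample; the function name and data are hypothetical, and the comparison against a tabulated critical value is left out.

```python
from statistics import mean, stdev

def grubbs_statistic(data):
    """Return (G, suspect) where G = max|x - mean| / s for the sample."""
    center = mean(data)
    spread = stdev(data)  # sample standard deviation (n - 1 denominator)
    suspect = max(data, key=lambda x: abs(x - center))
    return abs(suspect - center) / spread, suspect

# Hypothetical concentrations with one unusually high value.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
g, suspect = grubbs_statistic(sample)
print(f"G = {g:.3f} for suspect value {suspect}")
```

The value furthest from the mean is flagged as the suspect; whether it is declared an outlier depends on the critical value for the chosen sample size and significance level.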
The module computes parametric (normal, lognormal, and gamma distribution based) and nonparametric (for example, bootstrap, central limit theorem) upper limits, including confidence, prediction, tolerance, and simultaneous limits. The software also computes parametric and nonparametric two-sided confidence intervals, prediction intervals, tolerance intervals, and simultaneous intervals.
This module can be used to perform one-way parametric and nonparametric analysis of variance (ANOVA). The trend tests option can perform trend evaluations on time-series data sets using the Mann-Kendall test and Theil-Sen nonparametric trend line, supplemented with graphical displays.
The regression module can perform classical and robust regressions using several linear methods, including least median of squares regression and least percentile of squares regression for classical methods, and M-estimation, Huber, biweight, and PROP influence functions for robust methods. In addition, the module generates and displays prediction and confidence limits around fitted regression models for first order linear models. The graphical displays available within the module can be used to identify outliers and leverage points and to compare the performance of the various classical and robust regression methods.
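A nonparametric upper confidence limit of the mean can be sketched with a simple percentile bootstrap. This Python example is an assumption-laden illustration, not the algorithm used by Scout or ProUCL: the function name, the number of resamples, the fixed seed, and the data are all hypothetical, and the software offers other bootstrap variants that are not shown.

```python
import random
from statistics import mean

def bootstrap_ucl(data, confidence=0.95, n_resamples=2000, seed=0):
    """Percentile-bootstrap upper confidence limit of the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_resamples))
    index = min(n_resamples - 1, int(confidence * n_resamples))
    return means[index]

# Hypothetical uncensored concentrations (mg/L).
sample = [2.1, 3.4, 1.8, 5.6, 4.2, 3.9, 2.7, 6.1]
print(f"mean = {mean(sample):.2f}, 95% UCL = {bootstrap_ucl(sample):.2f}")
```

The upper limit is read from the sorted resample means at the requested confidence level, so no distributional assumption is needed.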
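The two trend tools named above rest on simple pairwise calculations, sketched below in Python. The function names and data are hypothetical, and the sketch stops at the Mann-Kendall S statistic and the Theil-Sen slope; the significance assessment of S (and any handling of ties or nondetects) is not included.

```python
from itertools import combinations
from statistics import median

def mann_kendall_s(values):
    """Mann-Kendall S: sum of sign(later - earlier) over all ordered pairs."""
    s = 0
    for earlier, later in combinations(values, 2):
        s += (later > earlier) - (later < earlier)
    return s

def theil_sen_slope(times, values):
    """Theil-Sen slope: median of the slopes between all pairs of points."""
    points = list(zip(times, values))
    slopes = [(y2 - y1) / (t2 - t1)
              for (t1, y1), (t2, y2) in combinations(points, 2)
              if t2 != t1]
    return median(slopes)

# Hypothetical sampling events and concentrations in time order.
times = [0, 1, 2, 3, 4]
concentrations = [1.0, 3.0, 5.0, 7.0, 9.0]
print(mann_kendall_s(concentrations))
print(theil_sen_slope(times, concentrations))
```

A positive S suggests an increasing trend and a negative S a decreasing one, while the Theil-Sen slope estimates the rate of change without being dominated by a few extreme values.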
| Statistical Method | Capability As Is | Capability with Scripts/Add-Ins |
|---|---|---|
| Handling of NDs | | |
| | ● | N/A |
| | ● | N/A |
| | ● | N/A |
| | ● | N/A |
| Exploratory/Diagnostic Tools | | |
| Summary Statistics | ● | N/A |
| | ● | N/A |
| | ● | N/A |
| Data transformations | ● | N/A |
| Statistical Design | | |
| Statistical Power | ◒ | N/A |
| | ◒ | N/A |
| Contaminant ranking | | N/A |
| | | N/A |
| Statistical Limits | | |
| | ● | N/A |
| | ● | N/A |
| | ● | N/A |
| Testing Compliance Limits | ◒ | N/A |
| Graphics | | |
| Plots/Charts | ● | N/A |
| Batch plots | ● | N/A |
| Tweaking of graphics | ● | N/A |
| Statistical Comparisons | | |
| | ● | N/A |
| | ◒ | N/A |
| Spatial Analysis | | |
| Geostatistics/Mapping | ◒ | N/A |
| | N/A | |
| | N/A | |
| Regression/Time Series | | |
| | ● | N/A |
| | ● | N/A |
| | ● | N/A |
| | | N/A |
| | ● | N/A |
| | ● | N/A |
| Multivariate Analysis | | |
| Multiple regression | ● | N/A |
| Factor/Discriminant analysis | N/A | |
| | ● | |
Capability Ratings:
N/A = Not applicable or not available
● = Full capability
◒ = Some capability
(blank cell) = No capability
Add-Ins Available
None
Ease of Use and Data Import
Scout 2008 v 1.00.01 is user-friendly software, and its User Guide provides details and examples on how to implement the various methods available in Scout 2008. Scout 2008 can read Microsoft Excel spreadsheet (XLS) or comma-separated values (CSV) files. Output files generated by Scout can be saved in Excel spreadsheet or as Outlook offline storage table (OST) files.
Types of Distributions
Scout can perform statistical evaluations for normal, lognormal, and gamma distributed uncensored data sets and left-censored data sets consisting of nondetects with multiple reporting limits. Several robust, nonparametric, and bootstrap methods are also available in Scout 2008 v 1.00.01.
Visualization
Scout permits construction of plots for single and grouped data sets, including histograms, box plots, quantile-quantile plots, index plots, and 2D and 3D scatter plots. Scout can also generate univariate graphs for regression analyses including residual plots, observed versus predicted plots, and bivariate regression line plots. Results from other statistical modules may also be displayed on some of the graphs. Graphical displays are interactive and allow a limited amount of editing within the software. Plots may be exported as an image file or copied into other image editing software.
Several of the other modules within Scout can generate graphs in order to determine data set distributions, compare grouped data, identify outliers, compare performances of various methods for outlier identification, compare leverage points, and provide univariate and multivariate classical and robust method quality assurance and control.
All modules of Scout generate graphical output displays (GST file), Excel spreadsheets, or both graphical displays and spreadsheets. Some of the graphics generated include side-by-side box plots, histograms, index plots, multiple quantile-quantile plots, interval graphs, control charts, and bivariate scatter plots of raw data.
Primary Uses for Groundwater Data Analysis
This software package performs statistical evaluations of data sets for groundwater or soil contaminants. Scout provides alternatives for sites at which practitioners must evaluate site contaminant cleanup criteria, perform trend analyses on groundwater contaminant monitoring data, or perform statistical evaluations on the data sets, including distribution assessments, outlier identification, and interval estimates to compute decision making statistics.
Benefits
- user-friendly, with easy presentation of results that supports interpretation and decision making
- capable of handling large data sets
Limitations and Data Requirements
Scout v1.00.01 (2009) is designed specifically for use in evaluation of soil and groundwater data and is not a general statistics package. There is no plan to upgrade the Scout software in the near future.
References
An overview and user guide is provided online and gives more detailed information on the capabilities of the software.
Publication Date: December 2013